This is an R Markdown document demonstrating the analysis of the data from the Biorxiv paper “Universal Amplicon Sequencing of North Imperial Valley Wetlands Microbiomes”.

Zymo Mock Community Analysis

First we analyze how our Universal Amplicon (UA) method performs on the Zymo Mock Community by building a bar chart that compares our experimental distribution of reads to the theoretical or exepected distribution of the Zymo Mock community.

Figure 1A: Comparison of known composition for the Zymo mock community(far-right bar) compared to 28 sequencing runs using the UA primers described in this paper.  Perfect quantitation with the UA method would be an exact match of a given bar to the far-right bar

Figure 1A: Comparison of known composition for the Zymo mock community(far-right bar) compared to 28 sequencing runs using the UA primers described in this paper. Perfect quantitation with the UA method would be an exact match of a given bar to the far-right bar



Next we want to generate some box plots of the expected/theoretical distribution vs the actual distribution separated by genus.

Figure 1B: Box and whisker plots showing abundance of each organism within the Zymo mock community over all 28 runs compared to the theoretical or expected abundance of each organism within the mock community.

Figure 1B: Box and whisker plots showing abundance of each organism within the Zymo mock community over all 28 runs compared to the theoretical or expected abundance of each organism within the mock community.



Now lets combine the two figures above

Shotgun Metagenomics Comparison



To further validate the Universal Amplicon (UA) approach we collected data from the following sites around the North Imperial Valley.

Sampling Station Name Site ID Latitude Longitude Average Water Filtered (mL) Standard Deviation (mL)
Morton Bay IVF005 33.20168 -115.5971 59 15
Salton Sea Beach IVF006 33.16197 -115.6470 103 56
Alamo River IVF007 33.17667 -115.5760 162 75
Alamo River IVF017 33.19911 -115.5970 162 75
Salton Sea Obsidian Butte IVF008 33.17525 -115.6385 94 36
N MacDonald Rd IVF010 33.20634 -115.5654 141 39
Managed Marsh IVF016 33.20590 -115.5271 168 77
Managed Marsh IVF022 33.19866 -115.5276 168 77




Of those sites we selected four time points from Morton Bay to performed shotgun sequencing and uncover any potential biases with the UA in real world data.




Sample Name Sampling Station Site ID Sampling Date Extracted gDNA Concentration (ng/ul)
IVF005_2018.04.30 Morton Bay IVF005 2018-04-30 22.4
IVF005_2018.05.30 Morton Bay IVF005 2018-05-30 100.0
IVF005_2018.06.26 Morton Bay IVF005 2018-06-26 19.4
IVF005_2018.07.25 Morton Bay IVF005 2018-07-27 45.3




Figure 2: Comparison of shotgun sequencing versus UA characterization of microbial community composition for Morton Bay samples from 2018-04-30, 2018-05-30, 2018-06-26, and 2018-07-25. Heatmap representation of the fraction of sequencing reads corresponding to the given Phyla.  Our UA reads contained a significant quantity of reads that could not be classified any more specifically than the Kingdom of Fungi, and hence this kingdom appears adjacent to other phyla.<br/><br/><br/><br/>

Figure 2: Comparison of shotgun sequencing versus UA characterization of microbial community composition for Morton Bay samples from 2018-04-30, 2018-05-30, 2018-06-26, and 2018-07-25. Heatmap representation of the fraction of sequencing reads corresponding to the given Phyla. Our UA reads contained a significant quantity of reads that could not be classified any more specifically than the Kingdom of Fungi, and hence this kingdom appears adjacent to other phyla.



<br/>Figure 2v2: Comparison of shotgun sequencing versus UA characterization of microbial community composition for Morton Bay samples from 2018-04-30, 2018-05-30, 2018-06-26, and 2018-07-25 by fraction of total reads visualized as a bar plot colored by Phylum assignment


Figure 2v2: Comparison of shotgun sequencing versus UA characterization of microbial community composition for Morton Bay samples from 2018-04-30, 2018-05-30, 2018-06-26, and 2018-07-25 by fraction of total reads visualized as a bar plot colored by Phylum assignment



North Imperial Valley Community Analysis

Now that we have confirmed that the fraction of total reads from each phylum overlaps between shotgun metagenomics and the UA amplicon method in identical samples we can expand our analysis to test the UA primers of real samples collected from the North Imperial Valley of California.

Community Overview

First we will look at the relationship between all our samples using a Non-metric multidimensional scaling ordination of each sample.



## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.08649805 
## Run 1 stress 0.0864567 
## ... New best solution
## ... Procrustes: rmse 0.02521667  max resid 0.2535512 
## Run 2 stress 0.08649724 
## ... Procrustes: rmse 0.02489518  max resid 0.2489992 
## Run 3 stress 0.09493234 
## Run 4 stress 0.09206775 
## Run 5 stress 0.08658861 
## ... Procrustes: rmse 0.02306773  max resid 0.258522 
## Run 6 stress 0.08649739 
## ... Procrustes: rmse 0.02488475  max resid 0.248925 
## Run 7 stress 0.0864968 
## ... Procrustes: rmse 0.02493382  max resid 0.2495259 
## Run 8 stress 0.09020723 
## Run 9 stress 0.08628265 
## ... New best solution
## ... Procrustes: rmse 0.01137106  max resid 0.1236761 
## Run 10 stress 0.08649769 
## ... Procrustes: rmse 0.02278985  max resid 0.255077 
## Run 11 stress 0.0872873 
## Run 12 stress 0.08669207 
## ... Procrustes: rmse 0.02807586  max resid 0.2626092 
## Run 13 stress 0.09291213 
## Run 14 stress 0.09080042 
## Run 15 stress 0.09128063 
## Run 16 stress 0.09321587 
## Run 17 stress 0.08645556 
## ... Procrustes: rmse 0.01138764  max resid 0.1236963 
## Run 18 stress 0.09425672 
## Run 19 stress 0.08669289 
## ... Procrustes: rmse 0.02804728  max resid 0.2619809 
## Run 20 stress 0.0872889 
## *** No convergence -- monoMDS stopping criteria:
##      4: no. of iterations >= maxit
##     16: stress ratio > sratmax
Figure 3A: Non-metric multidimensional scaling ordination of each sample. Samples are colored by station ID.<br/><br/><br/><br/>

Figure 3A: Non-metric multidimensional scaling ordination of each sample. Samples are colored by station ID.




Figure A (Plotly 3D): Recreation of Figure 3A in 3D using NMDS axis 1, 2 and 3 as calculated via phyloseq ordinate



Next lets look at the Shannon Alpha diversity at each station as well as perform a rarefication analysis across sample sites using both observed and shannon diversity.



##      Depth                                                Sample     
##  Min.   :     1   SGI_V4.UA_run1_2018.05.11_IVF008_2018.03.22:  300  
##  1st Qu.:    10   SGI_UA_run27_2019.12.18_IVF008_20190905    :  260  
##  Median :  1000   SGI_UA_run5_2018.11.02_IVF006_2018.08.24   :  260  
##  Mean   : 11336   SGI_V4.UA_run1_2018.05.11_IVF006_2018.03.22:  260  
##  3rd Qu.: 20000   SGI_UA_run27_2019.12.18_IVF008_20191001    :  240  
##  Max.   :110000   SGI_UA_run27_2019.12.18_IVF006_20190905    :  220  
##                   (Other)                                    :15920  
##      Measure     Alpha_diversity   
##  Observed:8730   Min.   :   0.000  
##  Shannon :8730   1st Qu.:   3.030  
##                  Median :   5.115  
##                  Mean   : 136.275  
##                  3rd Qu.: 174.000  
##                  Max.   :1230.000  
## 
##      Depth           Sample             Measure     Alpha_diversity
##  Min.   :     1   Length:8730        Shannon:8730   Min.   :0.000  
##  1st Qu.:    10   Class :character                  1st Qu.:2.164  
##  Median :  1000   Mode  :character                  Median :3.947  
##  Mean   : 11336                                     Mean   :3.373  
##  3rd Qu.: 20000                                     3rd Qu.:4.726  
##  Max.   :110000                                     Max.   :6.268
<br/>Supplemental Figure 1: Rarefication analysis across sample sites with predicted,observed,and Shannon diversity calculated across sequencing depth. Points are colored by unique sampling date.<br/><br/><br/>


Supplemental Figure 1: Rarefication analysis across sample sites with predicted,observed,and Shannon diversity calculated across sequencing depth. Points are colored by unique sampling date.


<br/>Supplemental Figure 5: Shannon diversity(calculated using phyloseq’s plot_richness) function across all sample sites.


Supplemental Figure 5: Shannon diversity(calculated using phyloseq’s plot_richness) function across all sample sites.



Now we will look at phylum and kingdom level bar charts over time for each sample location. The two IVF locations from Alamo River and Managed Marsh have been combined for visualization.



Figure 3A: Bar 264plot of ASVs over time at each station agglomerated to the Phylum level. Vertical lines mark where specific sampling 265locations for the Alamo River and Managed Marsh changed due to previous locations becoming inaccessible.<br/><br/><br/><br/>

Figure 3A: Bar 264plot of ASVs over time at each station agglomerated to the Phylum level. Vertical lines mark where specific sampling 265locations for the Alamo River and Managed Marsh changed due to previous locations becoming inaccessible.



<br/>Supplemental Figure 3Bar plot of ASVs over time at each station agglomerated to the Kingdom leve


Supplemental Figure 3Bar plot of ASVs over time at each station agglomerated to the Kingdom leve



We can also produce a heatmap using the 75 most abundance Amplicon Sequence Variants (ASVs) agglomerated to the class levels and sorted by sample name showing clear differentiation between sites.



## NULL
Supplemental Figure 2: Heatmap of the top 75 ASVs agglomerated to theclass levelcolumns sorted by sample name.

Supplemental Figure 2: Heatmap of the top 75 ASVs agglomerated to theclass levelcolumns sorted by sample name.



Obsidian Butte Mar-June 2018

Zooming in on a select group of samples from March through June 2018 at Obsidian Butte shows some temporal differentiation.

Alpha diversity spikes in the May 15th samples.



Figure 4A: Seven Alpha diversity indexes shown side by side to assess evenness and richness of the community. X axis reflects 298dates of interest and y axis the calculated alpha diversity. The higher the value of y the more diverse the community is. The 299measures used include Observed diversity index, Chao1 diversity index, ACE diversity index, Shannon diversity index, Simpson 300diversity index, Inverse Simpson diversity index and Fisher diversity index.

Figure 4A: Seven Alpha diversity indexes shown side by side to assess evenness and richness of the community. X axis reflects 298dates of interest and y axis the calculated alpha diversity. The higher the value of y the more diverse the community is. The 299measures used include Observed diversity index, Chao1 diversity index, ACE diversity index, Shannon diversity index, Simpson 300diversity index, Inverse Simpson diversity index and Fisher diversity index.



Next we have bar charts at Obsidian Butte from March to June at the phylum level and lowest assigned.



Figure 4B: ASV abundance at Obsidian Butte from March through Juneof 2018. ASV abundances were agglomeratedto the phylum level.<br/><br/><br/>

Figure 4B: ASV abundance at Obsidian Butte from March through Juneof 2018. ASV abundances were agglomeratedto the phylum level.




Figure 4B: ASV abundance at Obsidian Butte from March through Juneof 2018. ASV abundances were agglomeratedto the phylum level.<br/><br/><br/>

Figure 4B: ASV abundance at Obsidian Butte from March through Juneof 2018. ASV abundances were agglomeratedto the phylum level.




Finally we zoom in on some manually curated unique facet groups to look at change over time at Obsidian Butte. These plots show Obsidian Butte Phylum level abundances from March through Juneof 2018 separated into unique groups.Groups were chosen based on known correlations between certain species in the literature and overlapping trends in abundances.



<br/>Supplemental Figure 6A: Phylum level Group 1 abundance from March to June 2018<br/><br/>


Supplemental Figure 6A: Phylum level Group 1 abundance from March to June 2018

<br/>Supplemental Figure 6B: Phylum level Group 2 abundance from March to June 2018


Supplemental Figure 6B: Phylum level Group 2 abundance from March to June 2018

<br/>Supplemental Figure 6C: Phylum level Group 3 abundance from March to June 2018


Supplemental Figure 6C: Phylum level Group 3 abundance from March to June 2018